Signaling Scalability Information in a Parameter Set

ABSTRACT

A system for decoding a video bitstream includes receiving a frame of the video that includes at least one slice and at least one tile, where the at least one slice and the at least one tile are not all aligned with one another.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Pat. App. No. 61/728,997, filed Nov. 21, 2012.

BACKGROUND OF THE INVENTION

The present invention relates to video encoding and decoding.

Electronic devices have become smaller and more powerful in order to meet consumer needs and to improve portability and convenience. Consumers have become dependent upon electronic devices and have come to expect increased functionality. Some examples of electronic devices include desktop computers, laptop computers, cellular phones, smart phones, media players, integrated circuits, etc.

Some electronic devices are used for processing and/or displaying digital media. For example, portable electronic devices now allow for digital media to be produced and/or consumed at almost any location where a consumer may be. Furthermore, some electronic devices may provide download or streaming of digital media content for the use and enjoyment of a consumer.

Digital video is typically represented as a series of images or frames, each of which contains an array of pixels. Each pixel includes information, such as intensity and/or color information. In many cases, each pixel is represented as a set of three colors. Some video coding techniques provide higher coding efficiency at the expense of increasing complexity. Increasing image quality requirements and increasing image resolution requirements for video coding techniques also increase the coding complexity.

The increasing popularity of digital media has presented several problems. For example, efficiently representing high-quality digital media for storage, transmittal, and playback presents several challenges. Techniques that represent digital media more efficiently are beneficial.

The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one configuration of an electronic device including a HEVC encoder.

FIG. 2 is a block diagram illustrating one configuration of an electronic device including a HEVC decoder.

FIG. 3 is a block diagram illustrating one example of an encoder and a decoder.

FIG. 4 illustrates various components that may be utilized in an electronic device.

FIG. 5 illustrates an exemplary slice structure.

FIG. 6 illustrates another exemplary slice structure.

FIG. 7 illustrates a frame with one slice and nine tiles.

FIG. 8 illustrates a frame with three slices and three tiles.

FIGS. 9A-9C illustrate different NAL unit header syntax.

FIG. 10 illustrates a general NAL Unit syntax.

FIG. 11 illustrates an existing video parameter set.

FIG. 12 illustrates existing scalability types.

FIG. 13 illustrates an exemplary video parameter set.

FIG. 14 illustrates an exemplary scalability map syntax.

FIG. 15 illustrates an exemplary video parameter set.

FIG. 16 illustrates an existing video parameter set.

FIG. 17 illustrates an existing dimension_type, dimension_id syntax.

FIG. 18 illustrates an exemplary video parameter set.

FIG. 19 illustrates an exemplary scalability map syntax.

FIG. 20 illustrates an exemplary video parameter set.

FIG. 21 illustrates an exemplary video parameter set.

FIG. 22 illustrates an exemplary video parameter set.

FIG. 23 illustrates an exemplary source scan type information indicator syntax.

FIG. 24 illustrates an exemplary source information indicator syntax.

FIG. 25 illustrates an exemplary video parameter set.

FIG. 26 illustrates an exemplary video parameter set.

FIG. 27 illustrates an exemplary source scan type information indicator syntax.

FIG. 28 illustrates an exemplary source information indicator syntax.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

The Joint Collaborative Team on Video Coding (JCT-VC) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Study Group 16 (SG16) Working Party 3 (WP3) and International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Joint Technical Committee 1/Subcommittee 29/Working Group 11 (JTC1/SC29/WG11) has launched a standardization effort for a video coding standard called the High Efficiency Video Coding standard (HEVC). HEVC uses block-based coding.

In HEVC, an entropy coding technique, Context-Adaptive Binary Arithmetic Coding (CABAC), is used to compress Transformed and Quantized Coefficients (TQCs) without loss. TQCs may be from different block sizes according to transform sizes (e.g., 4×4, 8×8, 16×16, 32×32).

Two-dimensional (2D) TQCs may be converted into a one-dimensional (1D) array before entropy coding. In one example, 2D arrayed TQCs in a 4×4 block may be arranged as illustrated in Table (1).

TABLE (1)
  4      0      1      0
  3      2     −1     . . .
 −3      0    . . .   . . .
  0    . . .  . . .   . . .

When converting the 2D TQCs into a 1D array, the block may be scanned in a diagonal zig-zag fashion. Continuing with the example, the 2D arrayed TQCs illustrated in Table (1) may be converted into 1D arrayed TQCs [4, 0, 3, −3, 2, 1, 0, −1, 0, . . . ] by scanning the first row and first column, first row and second column, second row and first column, third row and first column, second row and second column, first row and third column, first row and fourth column, second row and third column, third row and second column, fourth row and first column and so on.
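As a concrete illustration, the following is a minimal sketch in C of the diagonal zig-zag scan just described, applied to the 4×4 block of Table (1). The function name and driver values are illustrative assumptions of this description, not part of any standard text.

#include <stdio.h>

#define N 4  /* block size; 4x4 in this example */

/* Convert a 2D block of TQCs to a 1D array with a diagonal zig-zag
 * scan matching the ordering described for Table (1). */
void zigzag_scan(const int block[N][N], int out[N * N])
{
    int pos = 0;
    for (int d = 0; d < 2 * N - 1; d++) {   /* d = row + col on each anti-diagonal */
        int lo = (d < N) ? 0 : d - N + 1;   /* valid row range on this diagonal */
        int hi = (d < N) ? d : N - 1;
        if (d % 2) {                        /* odd diagonals run top-right to bottom-left */
            for (int r = lo; r <= hi; r++)
                out[pos++] = block[r][d - r];
        } else {                            /* even diagonals run bottom-left to top-right */
            for (int r = hi; r >= lo; r--)
                out[pos++] = block[r][d - r];
        }
    }
}

int main(void)
{
    /* the 2D arrayed TQCs of Table (1), with zeros for the ". . ." entries */
    const int block[N][N] = {
        { 4, 0,  1, 0 },
        { 3, 2, -1, 0 },
        {-3, 0,  0, 0 },
        { 0, 0,  0, 0 },
    };
    int out[N * N];
    zigzag_scan(block, out);
    for (int i = 0; i < N * N; i++)
        printf("%d ", out[i]);              /* prints 4 0 3 -3 2 1 0 -1 0 0 ... */
    printf("\n");
    return 0;
}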

The coding procedure in HEVC may proceed, for example, as follows. The TQCs in the 1D array may be ordered according to scanning position. The scanning position of the last significant coefficient and the last coefficient level may be determined. The last significant coefficient may be coded. It should be noted that coefficients are typically coded in reverse scanning order. Run-level coding, which encodes information about runs of identical numbers and/or bits rather than encoding the numbers themselves, may then be performed; it is activated directly after the last coefficient coding. Then, level coding may be performed. The term significant coefficient refers to a coefficient that has a coefficient level value that is greater than zero. A coefficient level value refers to a unique indicator of the magnitude (or absolute value) of a Transformed and Quantized Coefficient (TQC) value.

This procedure may be illustrated in Table (2) as a continuation of the example above (with the 1D arrayed TQCs [4, 0, 3, −3, 2, 1, 0, −1, 0, . . . ]).

TABLE (2)
Scanning Position:       0   1   2   3   4   5   6   7   . . .
Coefficient Level:       4   0   3  −3   2   1   0  −1   . . .
Last Position:           7
Last Coefficient Level: −1
Run-Level Coding:        2   1   0
Level Coding:            4   0   3  −3

In Table (2), for example, the coefficient level −1 at scanning position 7 may be the last non-zero coefficient. Thus, the last position is scanning position 7 and the last coefficient level is −1. Run-level coding may be performed for coefficients 0, 1 and 2 at scanning positions 6, 5 and 4 (where coefficients are coded in reverse scanning order). Then, level coding may be performed for the coefficient levels −3, 3, 0 and 4.
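The bookkeeping behind Table (2) can be sketched as follows. This illustrative C fragment is an assumption of this description (it is not the HEVC entropy coder itself); it locates the last significant coefficient and then walks the remaining coefficients in reverse scanning order.

#include <stdio.h>

/* Find the last significant (non-zero) coefficient in scan order and
 * report the coefficients that would then be processed in reverse
 * scanning order, as in Table (2). */
void analyze_coefficients(const int coeff[], int n)
{
    int last = -1;
    for (int i = 0; i < n; i++)
        if (coeff[i] != 0)
            last = i;                       /* last non-zero position */
    if (last < 0)
        return;                             /* all-zero block: nothing to code */

    printf("last position = %d, last coefficient level = %d\n",
           last, coeff[last]);
    for (int i = last - 1; i >= 0; i--)     /* reverse scanning order */
        printf("position %d: level %d\n", i, coeff[i]);
}

int main(void)
{
    const int coeff[] = { 4, 0, 3, -3, 2, 1, 0, -1 };  /* 1D TQCs from the example */
    analyze_coefficients(coeff, 8);         /* last position 7, level -1 */
    return 0;
}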

FIG. 1 is a block diagram illustrating one configuration of an electronic device 102 in which video may be coded. It should be noted that one or more of the elements illustrated as included within the electronic device 102 may be implemented in hardware, software, or a combination of both. For example, the electronic device 102 includes an encoder 108, which may be implemented in hardware, software or a combination of both. For instance, the encoder 108 may be implemented as a circuit, integrated circuit, application-specific integrated circuit (ASIC), processor in electronic communication with memory with executable instructions, firmware, field-programmable gate array (FPGA), etc., or a combination thereof. In some configurations, the encoder 108 may be a high efficiency video coding (HEVC) coder.

The electronic device 102 may include a supplier 104. The supplier 104 may provide picture or image data (e.g., video) as a source 106 to the encoder 108. Examples of the supplier 104 include image sensors, memory, communication interfaces, network interfaces, wireless receivers, ports, etc.

The source 106 may be provided to an intra-frame prediction module and reconstruction buffer 110. The source 106 may also be provided to a motion estimation and motion compensation module 136 and to a subtraction module 116.

The intra-frame prediction module and reconstruction buffer 110 may generate intra mode information 128 and an intra signal 112 based on the source 106 and reconstructed data 150. The motion estimation and motion compensation module 136 may generate inter mode information 138 and an inter signal 114 based on the source 106 and a reference picture buffer 166 signal 168. The reference picture buffer 166 signal 168 may include data from one or more reference pictures stored in the reference picture buffer 166.

The encoder 108 may select between the intra signal 112 and the inter signal 114 in accordance with a mode. The intra signal 112 may be used in order to exploit spatial characteristics within a picture in an intra coding mode. The inter signal 114 may be used in order to exploit temporal characteristics between pictures in an inter coding mode. While in the intra coding mode, the intra signal 112 may be provided to the subtraction module 116 and the intra mode information 128 may be provided to an entropy coding module 130. While in the inter coding mode, the inter signal 114 may be provided to the subtraction module 116 and the inter mode information 138 may be provided to the entropy coding module 130.

Either the intra signal 112 or the inter signal 114 (depending on the mode) is subtracted from the source 106 at the subtraction module 116 in order to produce a prediction residual 118. The prediction residual 118 is provided to a transformation module 120. The transformation module 120 may compress the prediction residual 118 to produce a transformed signal 122 that is provided to a quantization module 124. The quantization module 124 quantizes the transformed signal 122 to produce transformed and quantized coefficients (TQCs) 126.

The TQCs are provided to an entropy coding module 130 and an inverse quantization module 140. The inverse quantization module 140 performs inverse quantization on the TQCs to produce an inverse quantized signal 142 that is provided to an inverse transformation module 144. The inverse transformation module 144 decompresses the inverse quantized signal 142 to produce a decompressed signal 146 that is provided to a reconstruction module 148.

The reconstruction module 148 may produce reconstructed data 150 based on the decompressed signal 146. For example, the reconstruction module 148 may reconstruct (modified) pictures. The reconstructed data 150 may be provided to a deblocking filter 152 and to the intra prediction module and reconstruction buffer 110. The deblocking filter 152 may produce a filtered signal 154 based on the reconstructed data 150.

The filtered signal 154 may be provided to a sample adaptive offset (SAO) module 156. The SAO module 156 may produce SAO information 158 that is provided to the entropy coding module 130 and an SAO signal 160 that is provided to an adaptive loop filter (ALF) 162. The ALF 162 produces an ALF signal 164 that is provided to the reference picture buffer 166. The ALF signal 164 may include data from one or more pictures that may be used as reference pictures. In some cases the ALF 162 may be omitted.

The entropy coding module 130 may code the TQCs to produce a bitstream 134. As described above, the TQCs may be converted to a 1D array before entropy coding. Also, the entropy coding module 130 may code the TQCs using CAVLC or CABAC. In particular, the entropy coding module 130 may code the TQCs 126 based on one or more of intra mode information 128, inter mode information 138 and SAO information 158. The bitstream 134 may include coded picture data.

Quantization, involved in video compression such as HEVC, is a lossy compression technique achieved by compressing a range of values to a single quantum value. The quantization parameter (QP) is a predefined scaling parameter used to perform the quantization, balancing the quality of reconstructed video against the compression ratio. The block type is defined in HEVC to represent the characteristics of a given block based on the block size and its color information. QP, resolution information and block type may be determined before entropy coding. For example, the electronic device 102 (e.g., the encoder 108) may determine the QP, resolution information and block type, which may be provided to the entropy coding module 130.

The entropy coding module 130 may determine the block size based on a block of TQCs 126. For example, block size may be the number of TQCs along one dimension of the block of TQCs. In other words, the number of TQCs in the block of TQCs may be equal to block size squared. In addition, the block may be non-square where the number of TQCs is the height times the width of the block. For instance, block size may be determined as the square root of the number of TQCs in the block of TQCs. Resolution may be defined as a pixel width by a pixel height. Resolution information may include a number of pixels for the width of a picture, for the height of a picture or both. Block size may be defined as the number of TQCs along one dimension of a 2D block of TQCs.
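A trivial sketch of these block-size relationships, with assumed example values:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* square block: TQC count equals block size squared */
    int num_tqcs = 16;                              /* e.g., a 4x4 block */
    int block_size = (int)(sqrt((double)num_tqcs) + 0.5);
    printf("block size = %d\n", block_size);        /* 4 */

    /* non-square block: TQC count is height times width */
    int height = 4, width = 8;
    printf("TQC count = %d\n", height * width);     /* 32 */
    return 0;
}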

In some configurations, the bitstream 134 may be transmitted to another electronic device. For example, the bitstream 134 may be provided to a communication interface, network interface, wireless transmitter, port, etc. For instance, the bitstream 134 may be transmitted to another electronic device via a Local Area Network (LAN), the Internet, a cellular phone base station, etc. The bitstream 134 may additionally or alternatively be stored in memory on the electronic device 102.

FIG. 2 is a block diagram illustrating one configuration of an electronic device 270 including a decoder 272 that may be a high-efficiency video coding (HEVC) decoder. The decoder 272 and one or more of the elements illustrated as included in the decoder 272 may be implemented in hardware, software or a combination of both. The decoder 272 may receive a bitstream 234 (e.g., one or more coded pictures included in the bitstream 234) for decoding. In some configurations, the received bitstream 234 may include received overhead information, such as a received slice header, received picture parameter set (PPS), received buffer description information, classification indicator, etc.

Received symbols (e.g., encoded TQCs) from the bitstream 234 may be entropy decoded by an entropy decoding module 274. This may produce a motion information signal 298 and decoded transformed and quantized coefficients (TQCs) 278.

The motion information signal 298 may be combined with a portion of a decoded picture 292 from a frame memory 290 at a motion compensation module 294, which may produce an inter-frame prediction signal 296. The decoded transformed and quantized coefficients (TQCs) 278 may be inverse quantized and inverse transformed by an inverse quantization and inverse transformation module 280, thereby producing a decoded residual signal 282. The decoded residual signal 282 may be added to a prediction signal 205 by a summation module 207 to produce a combined signal 284. The prediction signal 205 may be a signal selected from either the inter-frame prediction signal 296 produced by the motion compensation module 294 or an intra-frame prediction signal 203 produced by an intra-frame prediction module 201. In some configurations, this signal selection may be based on (e.g., controlled by) the bitstream 234.

The intra-frame prediction signal 203 may be predicted from previously decoded information from the combined signal 284 (in the current frame, for example). The combined signal 284 may also be filtered by a deblocking filter 286. The resulting filtered signal 288 may be provided to a sample adaptive offset (SAO) module 231. Based on the filtered signal 288 and information 239 from the entropy decoding module 274, the SAO module 231 may produce an SAO signal 235 that is provided to an adaptive loop filter (ALF) 233. The ALF 233 produces an ALF signal 237 that is provided to the frame memory 290. The ALF signal 237 may include data from one or more pictures that may be used as reference pictures. The ALF signal 237 may be written to frame memory 290. The resulting ALF signal 237 may include a decoded picture. In some cases the ALF 233 may be omitted.

The frame memory 290 may include a decoded picture buffer (DPB). The frame memory 290 may also include overhead information corresponding to the decoded pictures. For example, the frame memory 290 may include slice headers, picture parameter set (PPS) information, cycle parameters, buffer description information, etc. One or more of these pieces of information may be signaled from a coder (e.g., encoder 108).

The frame memory 290 may provide one or more decoded pictures 292 to the motion compensation module 294. Furthermore, the frame memory 290 may provide one or more decoded pictures 292, which may be output from the decoder 272. The one or more decoded pictures 292 may be presented on a display, stored in memory or transmitted to another device, for example.

FIG. 3 is a block diagram illustrating one example of an encoder 308 and a decoder 372. In this example, electronic device A 302 and electronic device B 370 are illustrated. However, it should be noted that the features and functionality described in relation to electronic device A 302 and electronic device B 370 may be combined into a single electronic device in some configurations.

Electronic device A 302 includes the encoder 308. The encoder 308 may be implemented in hardware, software or a combination of both. In one configuration, the encoder 308 may be a high-efficiency video coding (HEVC) coder. Other coders may likewise be used. Electronic device A 302 may obtain a source 306. In some configurations, the source 306 may be captured on electronic device A 302 using an image sensor, retrieved from memory or received from another electronic device.

The encoder 308 may code the source 306 to produce a bitstream 334. For example, the encoder 308 may code a series of pictures (e.g., video) in the source 306. The encoder 308 may be similar to the encoder 108 described above in connection with FIG. 1.

The bitstream 334 may include coded picture data based on the source 306. In some configurations, the bitstream 334 may also include overhead data, such as slice header information, PPS information, etc. As additional pictures in the source 306 are coded, the bitstream 334 may include one or more coded pictures.

The bitstream 334 may be provided to the decoder 372. In one example, the bitstream 334 may be transmitted to electronic device B 370 using a wired or wireless link. In some cases, this may be done over a network, such as the Internet or a Local Area Network (LAN). As illustrated in FIG. 3, the decoder 372 may be implemented on electronic device B 370 separately from the encoder 308 on electronic device A 302. However, it should be noted that the encoder 308 and decoder 372 may be implemented on the same electronic device in some configurations. In an implementation where the encoder 308 and decoder 372 are implemented on the same electronic device, for instance, the bitstream 334 may be provided over a bus to the decoder 372 or stored in memory for retrieval by the decoder 372.

The decoder 372 may be implemented in hardware, software or a combination of both. In one configuration, the decoder 372 may be a high-efficiency video coding (HEVC) decoder. Other decoders may likewise be used. The decoder 372 may be similar to the decoder 272 described above in connection with FIG. 2.

FIG. 4 illustrates various components that may be utilized in an electronic device 409. The electronic device 409 may be implemented as one or more of the electronic devices. For example, the electronic device 409 may be implemented as the electronic device 102 described above in connection with FIG. 1, as the electronic device 270 described above in connection with FIG. 2, or both.

The electronic device 409 includes a processor 417 that controls operation of the electronic device 409. The processor 417 may also be referred to as a CPU. Memory 411, which may include read-only memory (ROM), random access memory (RAM) or any other type of device that may store information, provides instructions 413a (e.g., executable instructions) and data 415a to the processor 417. A portion of the memory 411 may also include non-volatile random access memory (NVRAM). The memory 411 may be in electronic communication with the processor 417.

Instructions 413b and data 415b may also reside in the processor 417. Instructions 413b and/or data 415b loaded into the processor 417 may also include instructions 413a and/or data 415a from memory 411 that were loaded for execution or processing by the processor 417. The instructions 413b may be executed by the processor 417 to implement one or more techniques disclosed herein.

The electronic device 409 may include one or more communication interfaces 419 for communicating with other electronic devices. The communication interfaces 419 may be based on wired communication technology, wireless communication technology, or both. Examples of communication interfaces 419 include a serial port, a parallel port, a Universal Serial Bus (USB), an Ethernet adapter, an IEEE 1394 bus interface, a small computer system interface (SCSI) bus interface, an infrared (IR) communication port, a Bluetooth wireless communication adapter, a wireless transceiver in accordance with 3rd Generation Partnership Project (3GPP) specifications and so forth.

The electronic device 409 may include one or more output devices 423 and one or more input devices 421. Examples of output devices 423 include a speaker, printer, etc. One type of output device that may be included in an electronic device 409 is a display device 425. Display devices 425 used with configurations disclosed herein may utilize any suitable image projection technology, such as a cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), gas plasma, electroluminescence or the like. A display controller 427 may be provided for converting data stored in the memory 411 into text, graphics, and/or moving images (as appropriate) shown on the display 425. Examples of input devices 421 include a keyboard, mouse, microphone, remote control device, button, joystick, trackball, touchpad, touchscreen, lightpen, etc.

The various components of the electronic device 409 are coupled together by a bus system 429, which may include a power bus, a control signal bus and a status signal bus, in addition to a data bus. However, for the sake of clarity, the various buses are illustrated in FIG. 4 as the bus system 429. The electronic device 409 illustrated in FIG. 4 is a functional block diagram rather than a listing of specific components.

The term “computer-readable medium” refers to any available medium that can be accessed by a computer or a processor. The term “computer-readable medium,” as used herein, may denote a computer- and/or processor-readable medium that is non-transitory and tangible. By way of example, and not limitation, a computer-readable or processor-readable medium may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or processor. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. The code for the decoder and/or encoder may be stored on a computer-readable medium.

An input picture comprising a plurality of coded tree blocks (e.g., generally referred to herein as blocks) may be partitioned into one or several slices. The values of the samples in the area of the picture that a slice represents may be properly decoded without the use of data from other slices, provided that the reference pictures used at the encoder and the decoder are the same and that de-blocking filtering does not use information across slice boundaries. Therefore, entropy decoding and block reconstruction for a slice do not depend on other slices. In particular, the entropy coding state may be reset at the start of each slice. The data in other slices may be marked as unavailable when defining neighborhood availability for both entropy decoding and reconstruction. The slices may be entropy decoded and reconstructed in parallel. Preferably, no intra prediction or motion-vector prediction is allowed across the boundary of a slice. In contrast, de-blocking filtering may use information across slice boundaries.

FIG. 5 illustrates an exemplary video picture 500 comprising eleven blocks in the horizontal direction and nine blocks in the vertical direction (nine exemplary blocks labeled 501-509). FIG. 5 illustrates three exemplary slices: a first slice denoted “SLICE #0” 520, a second slice denoted “SLICE #1” 530 and a third slice denoted “SLICE #2” 540. The decoder may decode and reconstruct the three slices 520, 530, 540 in parallel. Each of the slices may be transmitted in scan line order in a sequential manner. At the beginning of the decoding/reconstruction process for each slice, context models are initialized or reset and blocks in other slices are marked as unavailable for both entropy decoding and block reconstruction. The context model generally represents the state of the entropy encoder and/or decoder. Thus, for a block, for example the block labeled 503, in “SLICE #1”, blocks (for example, blocks labeled 501 and 502) in “SLICE #0” may not be used for context model selection or reconstruction. Whereas, for a block, for example the block labeled 505, in “SLICE #1,” other blocks (for example, blocks labeled 503 and 504) in “SLICE #1” may be used for context model selection or reconstruction. Therefore, entropy decoding and block reconstruction proceed serially within a slice. Unless slices are defined using flexible block ordering (FMO), blocks within a slice are processed in the order of a raster scan.

FIG. 6 depicts an exemplary block allocation into three slice groups: a first slice group denoted “SLICE GROUP #0” 550, a second slice group denoted “SLICE GROUP #1” 560 and a third slice group denoted “SLICE GROUP #2” 570. These slice groups 550, 560, 570 may be associated with two foreground regions and a background region, respectively, in the picture 580.

The arrangement of slices, as illustrated in FIG. 5, may be limited to defining each slice between a pair of blocks in the image scan order, also known as raster scan or a raster scan order. This arrangement of scan order slices is computationally efficient but does not tend to lend itself to highly efficient parallel encoding and decoding. Moreover, this scan order definition of slices also does not tend to group smaller localized regions of the image together that are likely to have common characteristics highly suitable for coding efficiency. The arrangement of slices, as illustrated in FIG. 6, is highly flexible in its arrangement but does not tend to lend itself to highly efficient parallel encoding or decoding. Moreover, this highly flexible definition of slices is computationally complex to implement in a decoder.

Referring to FIG. 7, a tile technique divides an image into a set of rectangular (inclusive of square) regions. The blocks (alternatively referred to as largest coding units or coded treeblocks in some systems) within each of the tiles are encoded and decoded in a raster scan order. The arrangement of tiles is likewise encoded and decoded in a raster scan order. Accordingly, there may be any suitable number of column boundaries (e.g., 0 or more) and there may be any suitable number of row boundaries (e.g., 0 or more). Thus, the frame may define one or more slices, such as the one slice illustrated in FIG. 7. In some embodiments, blocks located in different tiles are not available for intra-prediction, motion compensation, entropy coding context selection or other processes that rely on neighboring block information.

Referring to FIG. 8, the tile technique is shown dividing an image into a set of three rectangular columns. The blocks (alternatively referred to as largest coding units or coded treeblocks in some systems) within each of the tiles are encoded and decoded in a raster scan order. The tiles are likewise encoded and decoded in a raster scan order. One or more slices may be defined in the scan order of the tiles. Each of the slices is independently decodable. For example, slice 1 may be defined as including blocks 1-9, slice 2 may be defined as including blocks 10-28, and slice 3 may be defined as including blocks 29-126, which spans three tiles. The use of tiles facilitates coding efficiency by processing data in more localized regions of a frame.
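The slice partitioning of this example can be made concrete with the following sketch; the block ranges come directly from the text, while the function itself is illustrative.

#include <stdio.h>

/* Map a block index (in tile scan order) to its slice, using the
 * FIG. 8 example: slice 1 is blocks 1-9, slice 2 is blocks 10-28,
 * and slice 3 is blocks 29-126 (spanning three tiles). */
int slice_of_block(int block)
{
    if (block >= 1 && block <= 9)    return 1;
    if (block >= 10 && block <= 28)  return 2;
    if (block >= 29 && block <= 126) return 3;
    return -1;                       /* outside the frame */
}

int main(void)
{
    printf("block 15 is in slice %d\n", slice_of_block(15));   /* slice 2 */
    printf("block 100 is in slice %d\n", slice_of_block(100)); /* slice 3 */
    return 0;
}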

It is to be understood that in some cases the video coding may optionally not include tiles, and may optionally include the use of a wavefront encoding/decoding pattern for the frames of the video. In this manner, one or more lines of the video (such as a plurality of groups of one or more rows of macroblocks (or alternatively coded tree blocks), each group being representative of a wavefront substream) may be encoded/decoded in a parallel fashion. In general, the partitioning of the video may be constructed in any suitable manner.

Video coding standards often compress video data for transmission over a channel with limited frequency bandwidth and/or limited storage capacity. These video coding standards may include multiple coding stages such as intra prediction, transform from spatial domain to frequency domain, quantization, entropy coding, motion estimation, and motion compensation, in order to more effectively encode and decode frames. Many of the coding and decoding stages are unduly computationally complex.

Various scalable video coding techniques have been developed. In scalable video coding, a primary bitstream (generally referred to as the base layer bitstream) is received by a decoder. In addition, the decoder may receive one or more secondary bitstream(s) (generally referred to as enhancement layer(s)). The function of each enhancement layer may be: to improve the quality of the base layer bitstream; to improve the frame rate of the base layer bitstream; and/or to improve the pixel resolution of the base layer bitstream. Quality scalability is also referred to as Signal-to-Noise Ratio (SNR) scalability. Frame rate scalability is also referred to as temporal scalability. Resolution scalability is also referred to as spatial scalability.

Enhancement layer(s) can change other features of the base layer bitstream. For example, an enhancement layer can be associated with a different aspect ratio and/or viewing angle than the base layer. Another aspect of enhancement layers is that the base layer and an enhancement layer may correspond to different video coding standards, e.g. the base layer may be MPEG-2 (Motion Pictures Experts Group 2) and an enhancement layer may be HEVC-Ext (High Efficiency Video Coding extension).

An ordering may be defined between layers. For example:

Base layer (lowest) [layer 0]

Enhancement layer 0 [layer 1]

Enhancement layer 1 [layer 2]

Enhancement layer n (highest) [layer n+1]

The enhancement layer(s) may have dependency on one another (in addition to the base layer). In an example, enhancement layer 2 is usable only if at least a portion of enhancement layer 1 has been parsed and/or reconstructed successfully (and if at least a portion of the base layer has been parsed and/or reconstructed successfully).

The bitstream of the coded video may include a syntax structure that is placed into logical data packets generally referred to as Network Abstraction Layer (NAL) units. Each NAL unit includes a NAL unit header, such as a two-byte NAL unit header (e.g., 16 bits), to identify the purpose of the associated data payload. For example, each coded slice (and/or picture) may be coded in one or more slice (and/or picture) NAL units. Other NAL units may be included for other categories of data, such as, for example, supplemental enhancement information, coded slice of temporal sub-layer access (TSA) picture, coded slice of step-wise temporal sub-layer access (STSA) picture, coded slice of a non-TSA, non-STSA trailing picture, coded slice of broken link access picture, coded slice of instantaneous decoding refresh picture, coded slice of clean random access picture, coded slice of decodable leading picture, coded slice of tagged for discard picture, video parameter set, sequence parameter set, picture parameter set, access unit delimiter, end of sequence, end of bitstream, filler data, and/or sequence enhancement information message. Other NAL unit types may be included, as desired.

A random access point (RAP) picture contains only I slices and may be a broken link access (BLA) picture, a clean random access (CRA) picture, or an instantaneous decoding refresh (IDR) picture. The first picture in the bitstream is a RAP picture.

A broken link access (BLA) picture is one type of RAP picture. A BLA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or may appear later in the bitstream. Each BLA picture begins a new coded video sequence, and has the same effect on the decoding process as an IDR picture. However, a BLA picture contains syntax elements that, if it had been a CRA picture instead, would specify a non-empty reference picture set. When a BLA picture is encountered in a bitstream, these syntax elements are ignored and the reference picture set is instead initialized as an empty set.

A clean random access (CRA) picture is one type of RAP picture. A CRA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or may appear later in the bitstream. A CRA picture may have associated decodable leading pictures (DLP) and tagged for discard (TFD) pictures.

An instantaneous decoding refresh (IDR) picture is a type of RAP picture. An IDR picture contains only I slices, and may be the first picture in the bitstream in decoding order, or may appear later in the bitstream. Each IDR picture is the first picture of a coded video sequence in decoding order.

Decodable leading pictures (DLP) are leading pictures. DLP pictures are not used as reference pictures for the decoding process of trailing pictures of the same associated RAP picture.

Tagged for discard (TFD) pictures are leading pictures of an associated BLA or CRA picture. When the associated RAP picture is a BLA picture or is the first coded picture in the bitstream, the TFD picture is not output and may not be correctly decodable, as the TFD picture may contain references to reference pictures that are not present in the bitstream.

A leading picture is a picture that precedes the associated RAP picture in output order.

A trailing picture is a picture that follows the associated RAP picture in output order.

The NAL unit provides the capability to map the video coding layer (VCL) data that represents the content of the pictures onto various transport layers. The NAL units may be classified into VCL and non-VCL NAL units according to whether they contain coded picture or other associated data, respectively. B. Bross, W-J. Han, J-R. Ohm, G. J. Sullivan, and T. Wiegand, “High efficiency video coding (HEVC) text specification draft 8,” JCTVC-J1003, Stockholm, July 2012; “BoG on high-level syntax for extension planning,” Ye-Kui Wang, JCTVC-J0574, July 2012; and “BoG on high-level syntax for extension planning,” Ye-Kui Wang, JCTVC-J0574r1, July 2012, are hereby incorporated by reference herein in their entirety.

Referring to FIG. 9A, the NAL unit header syntax may include two bytes of data, namely, 16 bits. The first bit is a “forbidden zero bit,” which is always set to zero at the start of a NAL unit. The next six bits are a “nal_unit_type,” which specifies the type of raw byte sequence payload (“RBSP”) data structure contained in the NAL unit. The next six bits are a “nuh_reserved_zero_6bits.” The nuh_reserved_zero_6bits may be equal to 0 in the base specification of the standard. Other values of nuh_reserved_zero_6bits may be specified as desired. Decoders may ignore (i.e., remove from the bitstream and discard) all NAL units with values of nuh_reserved_zero_6bits not equal to 0 when handling a stream based on the base specification of the standard. In a scalable or other extension, nuh_reserved_zero_6bits may specify other values, to signal scalable video coding and/or syntax extensions. In some cases the syntax element nuh_reserved_zero_6bits may be called reserved_zero_6bits. In some cases the syntax element nuh_reserved_zero_6bits may be called layer_id_plus1 or layer_id, as illustrated in FIG. 9B and FIG. 9C. In this case the element layer_id will be layer_id_plus1 minus 1. In this case it may be used to signal information related to the layer of the scalable coded video. The next syntax element is “nuh_temporal_id_plus1”. nuh_temporal_id_plus1 minus 1 may specify a temporal identifier for the NAL unit. The variable temporal identifier TemporalId may be specified as TemporalId = nuh_temporal_id_plus1 − 1.
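A minimal sketch in C of parsing this two-byte header, assuming the bit layout just described (1 + 6 + 6 + 3 bits); the struct name and the example bytes are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned forbidden_zero_bit;        /* 1 bit, always 0 */
    unsigned nal_unit_type;             /* 6 bits */
    unsigned nuh_reserved_zero_6bits;   /* 6 bits (layer_id_plus1 in extensions) */
    unsigned nuh_temporal_id_plus1;     /* 3 bits */
} NalUnitHeader;

void parse_nal_unit_header(const uint8_t b[2], NalUnitHeader *h)
{
    h->forbidden_zero_bit      = (b[0] >> 7) & 0x01;
    h->nal_unit_type           = (b[0] >> 1) & 0x3F;
    /* the 6-bit reserved field straddles the byte boundary */
    h->nuh_reserved_zero_6bits = ((b[0] & 0x01) << 5) | ((b[1] >> 3) & 0x1F);
    h->nuh_temporal_id_plus1   = b[1] & 0x07;
}

int main(void)
{
    const uint8_t bytes[2] = { 0x40, 0x01 };    /* assumed example header bytes */
    NalUnitHeader h;
    parse_nal_unit_header(bytes, &h);
    printf("type=%u layer_bits=%u TemporalId=%u\n",
           h.nal_unit_type, h.nuh_reserved_zero_6bits,
           h.nuh_temporal_id_plus1 - 1);        /* TemporalId = nuh_temporal_id_plus1 - 1 */
    return 0;
}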

Referring to FIG. 10, a general NAL unit syntax structure is illustrated. The NAL unit header two-byte syntax of FIG. 9 is included in the reference to nal_unit_header( ) of FIG. 10. The remainder of the NAL unit syntax primarily relates to the RBSP.

One existing technique for using the “nuh_reserved_zero_6bits” is to signal scalable video coding information by partitioning the 6 bits of the nuh_reserved_zero_6bits into distinct bit fields, namely, one or more of a dependency ID, a quality ID, a view ID, and a depth flag, each of which refers to the identification of a different layer of the scalable coded video. Accordingly, the 6 bits indicate what layer of the scalable encoding technique this particular NAL unit belongs to. Then in a data payload, such as a video parameter set (“VPS”) extension syntax (“scalability_type”) as illustrated in FIG. 11, the information about the layer is defined. The VPS extension syntax of FIG. 11 includes 4 bits for scalability type (syntax element scalability_type), which specifies the scalability types in use in the coded video sequence and the dimensions signaled through layer_id_plus1 (or layer_id) in the NAL unit header. When the scalability type is equal to 0, the coded video sequence conforms to the base specification, thus layer_id_plus1 of all NAL units is equal to 0 and there are no NAL units belonging to an enhancement layer or view. Higher values of the scalability type are interpreted as illustrated in FIG. 12.

The layer_id_dim_len[i] specifies the length, in bits, of the i-th scalability dimension ID. The sum of the values layer_id_dim_len[i] for all i values in the range of 0 to 7 is less than or equal to 6. The vps_extension_byte_alignment_reserved_zero_bit is zero. The vps_layer_id[i] specifies the value of layer_id of the i-th layer to which the following layer dependency information applies. The num_direct_ref_layers[i] specifies the number of layers the i-th layer directly depends on. The ref_layer_id[i][j] identifies the j-th layer the i-th layer directly depends on.

In this manner, the existing technique signals the scalability identifiers in the NAL unit and in the video parameter set to allocate the bits among the scalability types listed in FIG. 12. Then for each scalability type, FIG. 12 defines how many dimensions are supported. For example, scalability type 1 has 2 dimensions (i.e., spatial and quality). For each of the dimensions, the layer_id_dim_len[i] defines the number of bits allocated to each of these two dimensions, where the total sum of all the values of layer_id_dim_len[i] is less than or equal to 6, which is the number of bits in the nuh_reserved_zero_6bits of the NAL unit header. Thus, in combination, the technique identifies which types of scalability are in use and how the 6 bits of the NAL unit header are allocated among the scalability types.
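One way to picture this allocation is the following sketch, which splits the 6-bit field using an assumed layer_id_dim_len[] of {3, 3} (3 bits spatial, 3 bits quality); the most-significant-bit-first ordering and the field value are assumptions of this illustration, not mandated by the text.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int layer_id_dim_len[2] = { 3, 3 };   /* per-dimension bit lengths; sum <= 6 */
    uint8_t layer_id_plus1 = 0x2A;              /* assumed value: 101010b */

    int shift = 6;                              /* walk down from the top of the 6-bit field */
    for (int i = 0; i < 2; i++) {
        shift -= layer_id_dim_len[i];
        unsigned id = (layer_id_plus1 >> shift) & ((1u << layer_id_dim_len[i]) - 1);
        printf("dimension %d: id = %u\n", i, id);   /* prints 5, then 2 */
    }
    return 0;
}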

While such a fixed combination of different scalability dimensions, as illustrated in FIG. 12, is suitable for many applications, there are desirable combinations which are not included. Referring to FIG. 13, a modified video parameter set extension syntax specifies a scalability type for each bit in the nuh_reserved_zero_6bits syntax element. The vps_extension_byte_alignment_reserved_zero_bit is set to 0. The max_num_layers_minus1_bits indicates the total number of bits used for the syntax element in the first two bytes of the NAL unit header in FIG. 9 referred to as layer_id_plus1 or nuh_reserved_zero_6bits. The scalability_map[i] specifies the scalability type for each bit in the layer_id_plus1 syntax element. In some cases the layer_id_plus1 syntax element may instead be called the nuh_reserved_zero_6bits or reserved_zero_6bits syntax element. The scalability map for all the bits of the syntax element layer_id_plus1 together specifies the scalability in use in the coded video sequence. The actual value of the identifier for each of the scalability types is signaled through those corresponding bits in the layer_id_plus1 (nuh_reserved_zero_6bits) field in the NAL unit header. When scalability_map[i] is equal to 0 for all values of i, the coded video sequence conforms to the base specification, thus the layer_id_plus1 value of NAL units is equal to 0 and there are no NAL units belonging to an enhancement layer or view. The vps_layer_id[i] specifies the value of layer_id of the i-th layer to which the following layer dependency information applies. The num_direct_ref_layers[i] specifies the number of layers the i-th layer directly depends on. The ref_layer_id[i][j] identifies the j-th layer the i-th layer directly depends on.

Higher values of scalability_map[i] are interpreted as shown in FIG. 14. The scalability_map[i] includes the scalability dimensions of: (0) none; (1) spatial; (2) quality; (3) depth; (4) multiview; (5) unspecified; (6) reserved; and (7) reserved.

Therefore each bit in the NAL unit header is interpreted based on the 3 bits in the video parameter set indicating the scalability dimension (e.g., none, spatial, quality, depth, multiview, unspecified, reserved). For example, to signal that all the bits in layer_id_plus1 correspond to spatial scalability, the scalability_map values in the VPS may be coded as 001 001 001 001 001 001 for the 6 bits of the NAL unit header. Also for example, to signal that 3 bits in layer_id_plus1 correspond to spatial scalability and 3 bits correspond to quality scalability, the scalability_map values in the VPS may be coded as 001 001 001 010 010 010 for the 6 bits of the NAL unit header.
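The following sketch decodes a layer_id_plus1 value against such a per-bit scalability_map. The map matches the second example above (3 bits spatial, 3 bits quality); the layer_id_plus1 value and the most-significant-bit-first grouping are assumptions of this illustration.

#include <stdio.h>

static const char *dim_name[8] = {
    "none", "spatial", "quality", "depth",
    "multiview", "unspecified", "reserved", "reserved"
};

int main(void)
{
    const int scalability_map[6] = { 1, 1, 1, 2, 2, 2 };   /* 001 001 001 010 010 010 */
    unsigned layer_id_plus1 = 0x2A;                        /* assumed value: 101010b */

    /* group contiguous bits that share a dimension code and extract
     * the identifier carried by each group */
    int i = 0;
    while (i < 6) {
        int j = i;
        unsigned id = 0;
        while (j < 6 && scalability_map[j] == scalability_map[i]) {
            id = (id << 1) | ((layer_id_plus1 >> (5 - j)) & 1);
            j++;
        }
        if (scalability_map[i] != 0)
            printf("%s: id = %u\n", dim_name[scalability_map[i]], id);
        i = j;                      /* prints "spatial: id = 5", "quality: id = 2" */
    }
    return 0;
}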

Referring to FIG. 15, another embodiment includes the video parameter set signaling the number of scalability dimensions in the 6 bits of the NAL unit header using num_scalability_dimensions_minus1. The num_scalability_dimensions_minus1 plus 1 indicates the number of scalability dimensions signaled through the layer_id_plus1; nuh_reserved_zero_6bits; and/or reserved_zero_6bits syntax elements. The scalability_map[i] has the same semantics as described above in relation to FIG. 13. The num_bits_for_scalability_map[i] specifies the length in bits for the i-th scalability dimension. The sum of all of the num_bits_for_scalability_map[i] for i=0, . . . , num_scalability_dimensions_minus1 is equal to six (or otherwise equal to the number of bits used for the layer_id_plus1; vps_reserved_zero_6bits; max_num_layers_minus1; reserved_zero_6bits; nuh_reserved_zero_6bits syntax elements).

With respect to FIG. 13 and FIG. 15, other variations may be used, if desired. In one embodiment, for example, the scalability_map[i] may be signaled with u(4) (or u(n) with n>3 or n<3). In this case the higher values of scalability_map[i] may be specified as reserved for bitstreams conforming to a particular profile of the video technique. For example, scalability_map values 6 . . . 15 may be specified as ‘reserved’ when signaling scalability_map[i] with u(4). In another embodiment, for example, scalability_map[i] may be signaled with ue(v) or some other coding scheme. In another embodiment, for example, a restriction may be specified such that the scalability_map[i] values are arranged in monotonic non-decreasing (or non-increasing) order. This results in the various scalability dimension fields in the layer_id_plus1 field in the NAL unit header being contiguous.

Another existing technique for signaling the scalable video coding using the “layer_id_plus1” or “nuh_reserved_zero_6bits” syntax element is to map the layer_id_plus1 in the NAL unit header to a layer identification by signaling a general lookup table in the video parameter set. Referring to FIG. 16, the existing technique includes a video parameter set that specifies the number of dimension types and dimension identifications for the i-th layer of the lookup table. In particular, the vps_extension_byte_alignment_reserved_zero_bit is zero. The num_dimensions_minus1[i] plus 1 specifies the number of dimension types (dimension_type[i][j]) and dimension identifiers (dimension_id[i][j]) for the i-th layer. The dimension_type[i][j] specifies the j-th scalability dimension type of the i-th layer, which has layer_id or layer_id_plus1 equal to i, as specified in FIG. 17. As illustrated in FIG. 17, the dimensions that are identified include (0) view order idx; (1) depth flag; (2) dependency ID; (3) quality ID; and (4)-(15) reserved. The dimension_id[i][j] specifies the identifier of the j-th scalability dimension type of the i-th layer, which when not present is inferred to be 0. The num_direct_ref_layers[i] specifies the number of layers the i-th layer directly depends on. The ref_layer_id[i][j] identifies the j-th layer the i-th layer directly depends on. Unfortunately, the technique illustrated in FIG. 16 results in an unwieldy large lookup table.

Referring to FIG. 18, a modified video parameter set extension includes a scalability mask that is used in combination with a scalability dimension. The scalability_mask signals a pattern of 0 and 1 bits with each bit corresponding to one scalability dimension as indicated by the scalability map syntax of FIG. 19. A value of 1 for a particular scalability dimension indicates that this scalability dimension is present in this layer (i-th layer). A value of 0 for a particular scalability dimension indicates that this scalability dimension is not present in this layer (i-th layer). For example, a set of bits of 00100000 refers to quality scalability. The actual identifier value of the particular scalability dimension that is present is indicated by the scalability_id[j] value signaled. The value of num_scalability_types[i] is equal to the number of bits in the scalability_mask having a value of 1. Thus

$\mathrm{num\_scalability\_types}[i] = \sum_{k=0}^{7} \mathrm{scalability\_mask}[i](k).$

The scalability_id[j] indicates the j-th scalability dimension's identifier value for the type of scalability values that are signaled by the scalability_mask value.
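In code, the relationship between the mask and the number of signaled identifiers reduces to a population count, as this sketch shows; the mask value is the quality-scalability example from the text, and the bit ordering is assumed to follow FIG. 19.

#include <stdio.h>

int main(void)
{
    unsigned scalability_mask = 0x20;       /* 00100000b: quality scalability */

    /* num_scalability_types[i] is the number of 1 bits in the mask */
    int num_scalability_types = 0;
    for (int k = 0; k < 8; k++)
        num_scalability_types += (scalability_mask >> k) & 1;
    printf("num_scalability_types = %d\n", num_scalability_types);  /* 1 */

    /* the bitstream then carries one scalability_id[j] value for
     * each bit set in the mask */
    return 0;
}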

Referring to FIG. 20, a modification of FIG. 18 includes the scalability mask being signaled outside the loop. This results in one common mask for each layer identification. Referring to FIG. 21, in this modification a corresponding exemplary video parameter set may include the scalable identification with the scalability mask not being included. In this case the syntax element scalable_id[j] has the same interpretation as the syntax element scalability_id[j] in FIG. 18.

Referring to FIG. 22, a modification of FIG. 18 includes additional source information indicators being signaled. Source scan type information is signaled by source_scan_type_info_idc[i] for each layer i. source_scan_type_info_idc[i] equal to 0 indicates that the layer i has pictures whose scan type should be interpreted as interlaced scan type for all pictures. source_scan_type_info_idc[i] equal to 1 indicates that the layer i has pictures whose scan type should be interpreted as progressive scan type for all pictures. source_scan_type_info_idc[i] equal to 2 indicates that the layer i has pictures whose scan type is unknown. source_scan_type_info_idc[i] equal to 3 indicates that the layer i has pictures which are a mixture of progressive scan type and interlaced scan type. This is indicated in FIG. 23.

Source 2D/3D information is signaled by source_2d_3d_info_idc[i] for each layer i. source_2d_3d_info_idc[i] equal to 1 indicates that all pictures for layer i are packed in a frame compatible 3D format. source_2d_3d_info_idc[i] equal to 0 indicates that all pictures for layer i are 2D frames, i.e., none of them are packed in a frame compatible 3D format. source_2d_3d_info_idc[i] equal to 2 indicates that the layer i has pictures whose frame packing arrangement is unknown. source_2d_3d_info_idc[i] equal to 3 indicates that the layer i has pictures which are a mixture of frame compatible 3D and 2D frames. This is indicated in FIG. 24.
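A small sketch of how a decoder might interpret these two per-layer indicators (values per FIG. 23 and FIG. 24); the layer values in the driver are assumed for illustration.

#include <stdio.h>

static const char *scan_type_name(int idc)
{
    switch (idc) {
    case 0:  return "interlaced";
    case 1:  return "progressive";
    case 2:  return "unknown";
    case 3:  return "mixed interlaced/progressive";
    default: return "invalid";
    }
}

static const char *packing_name(int idc)
{
    switch (idc) {
    case 0:  return "2D frames";
    case 1:  return "frame-compatible 3D";
    case 2:  return "unknown";
    case 3:  return "mixed 2D/3D";
    default: return "invalid";
    }
}

int main(void)
{
    int source_scan_type_info_idc[2] = { 1, 1 };    /* assumed per-layer values */
    int source_2d_3d_info_idc[2]     = { 0, 1 };
    for (int i = 0; i < 2; i++)
        printf("layer %d: %s, %s\n", i,
               scan_type_name(source_scan_type_info_idc[i]),
               packing_name(source_2d_3d_info_idc[i]));
    return 0;
}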

In another variant the syntax elements source_scan_type_info_idc[i] and/or source_2d_3d_info_idc[i] may be signaled with u(1) or u(n) with n>3. Or they may be signaled with ue(v) or some other coding method.

In another variant embodiment different names may be used for the syntax elements than the names source_scan_type_info_idc[i] and source_2d_3d_info_idc[i].

In another embodiment the syntax elements source_scan_type_info_idc[i] and/or source_2d_3d_info_idc[i] may be signaled in the Sequence Parameter Set (SPS), Picture Parameter Set (PPS), Slice Header or other normative part of the bitstream.

In another embodiment the syntax elements source_scan_type_info_idc and source_2d_3d_info_idc may be signaled only once if they are the same for all the layers. In this case the syntax elements are signaled once as source_scan_type_info_idc and source_2d_3d_info_idc without the layer index i. This is shown in FIG. 25.

In another embodiment the syntax elements source_scan_type_info_idc and/or source_2d_3d_info_idc may be signaled also for the base layer. This is shown in FIG. 26.

In another embodiment the syntax elements source_scan_type_info_idc[i] and source_2d_3d_info_idc[i] may be signaled at another place in the VPS different than the one shown in FIG. 22, FIG. 25, and FIG. 26. They may be signaled in the base VPS and/or in the vps_extension( ).

In a variant embodiment the syntax elements source_scan_type_info_idc[i] and/or source_2d_3d_info_idc[i] may be signaled with u(3) instead of u(2). The values for them are shown respectively in FIG. 27 and FIG. 28.

Also in FIG. 22 and FIG. 25, avc_base_codec_flag equal to 1 specifies that the base layer conforms to Rec. ITU-T H.264 | ISO/IEC 14496-10, and avc_base_codec_flag equal to 0 specifies that the base layer conforms to HEVC. vps_nuh_layer_id_present_flag indicates whether the layer_id_in_nuh[i] variable, which signals the value of layer_id in the NAL unit header, is signaled.

In another embodiment one or more of the syntax elements scalability_mask[i], scalability_mask, and scalability_id[j] may be signaled using a different number of bits than u(8). For example they could be signaled with u(16) (or u(n) with n>8 or n<8). In another embodiment one or more of these syntax elements could be signaled with ue(v). In another embodiment the scalability_mask may be signaled in the NAL unit header in the layer_id_plus1; vps_reserved_zero_6bits; max_num_layers_minus1; reserved_zero_6bits; and/or nuh_reserved_zero_6bits syntax elements. In some embodiments the system may do this only for VPS NAL units, or only for non-VPS NAL units, or for all NAL units. In yet another embodiment scalability_mask may be signaled per picture anywhere in the bitstream. For example it may be signaled in a slice header, picture parameter set, video parameter set, or any other parameter set or any other normative part of the bitstream.

It should be noted that FIGS. 13, 15, 18, 20, 21, 22 and the corresponding description refer to 6 bits since the syntax element nuh_reserved_zero_6bits or layer_id_plus1 in the NAL unit header of FIG. 9 has 6 bits. However, all the above description can be suitably modified if that syntax element used a different number of bits than 6 bits. For example, if that syntax element (nuh_reserved_zero_6bits or layer_id_plus1) instead used 9 bits, then in FIG. 13 the value of max_num_layers_minus1_bits would be 9 and the scalability_map[i] would be signaled for each of the 9 bits instead of 6 bits.

The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.

1.-12. (canceled)
 13. A method for decoding a coded video sequence comprising: receiving a first bitstream representative of a base layer of a coded video picture; receiving a second bitstream representative of an enhancement layer of the coded video picture; receiving a first source scan type indicator for the base layer and a second source scan type indicator for the enhancement layer, each of the first and second source scan type indicators indicating a source scan type for the corresponding layer; decoding the base layer based on the first source scan type indicator; decoding the enhancement layer based on the second source scan type indicator; and generating a decoded video sequence based on the base layer and the enhancement layer.
 14. The method of claim 13, wherein the first source scan type indicator indicates an interlaced scan type for the base layer.
 15. The method of claim 13, wherein the first source scan type indicator indicates a progressive scan type for the base layer.
 16. The method of claim 13, wherein the first source scan type indicator indicates an unknown scan type for the base layer.
 17. The method of claim 13, wherein the second source scan type indicator indicates an interlaced scan type for the enhancement layer.
 18. The method of claim 13, wherein the second source scan type indicator indicates a progressive scan type for the enhancement layer.
 19. The method of claim 13, wherein the second source scan type indicator indicates an unknown scan type for the enhancement layer.